Topological network motifs represent functional relationships within and between regulatory and protein-protein interaction 
networks. Enriched motifs often aggregate into self-contained units forming functional modules. Theoretical models for network 
evolution by duplication-divergence mechanisms and for network topology by hierarchical scale-free networks have suggested 
a one-to-one relation between network motif enrichment and aggregation, but this relation has never been tested quantitatively 
in real biological interaction networks. Here we introduce a novel method for assessing the statistical significance of network 
motif aggregation and for identifying clusters of overlapping network motifs. Using an integrated network of transcriptional, 
posttranslational and protein-protein interactions in yeast we show that network motif aggregation reflects a local modularity 
property which is independent of network motif enrichment. In particular our method identified novel functional network themes 
for a set of motifs which are not enriched yet aggregate significantly and challenges the conventional view that network motif 
enrichment is the most basic organizational principle of complex networks. 
1 Introduction 

Reconstructing the organizational principles that determine
the structure and function of regulatory and protein-protein
interaction networks is a key challenge of network biology.
Network motifs, small subgraphs occurring significantly more often
than expected by chance, have been proposed as the basic
building blocks of complex networks, including integrated
networks composed of multiple types of interactions. In
transcriptional regulatory networks, network motifs are known
to aggregate into larger, self-contained units. This concept
was extended to integrated networks and resulted in the
definition of 'network themes', frequently recurring higher-level
patterns of overlapping network motifs, which characterize
the structure of functional modules. Several studies
have further investigated the connection between network motif
enrichment and aggregation, from a topological as well as
from an evolutionary perspective. In hierarchical scale-free
random networks the enrichment and aggregation of a certain
class of subgraphs are intimately related to each other and
to the global topological network parameters. Furthermore
these subgraphs tend to aggregate around network hubs. A
comparative phylogenetic analysis of genes within motifs has
shown that they are not subject to any evolutionary pressure
to preserve the motif pattern. A likely reason is that the
motifs aggregate and cannot be considered in isolation.
In a simple duplication-divergence model for network growth,
modularity, accompanied by subgraph abundance, can appear
for free without selection pressure. On the other hand, in
a model of evolving electronic circuits, modularity emerges
only in an environment that changes itself in a modular manner,
while network motif enrichment appears only if the modularly
varying goal contains information-processing tasks.

An important question that has not been addressed before is 
whether the aggregation of a motif is indeed surprising or sig- 
nificant, given the number of motif instances that have to fit 
on a network with a fixed degree distribution, and if so, how 
such aggregation relates to motif enrichment and whether any 
functional interpretation can be given to it. Here we address 
this question using random network ensembles which preserve 
the degree distribution as well as the total motif count, a new 
network motif aggregation statistic, and a novel algorithm for 
identifying clusters of overlapping motifs to assess in a quan- 
titative way the enrichment and aggregation significance of 
all composite motifs in a network which integrates transcriptional
and posttranslational regulatory interactions as well as physical
protein-protein interactions in yeast.

This journal is © The Royal Society of Chemistry [year]

Journal Name, 2010, [vol], 1-[12] | 1



2 Results 



2.1 Molecular interaction networks in yeast deviate from 
the hierarchical scale-free model 

The most detailed study of the relation between network motif
enrichment and aggregation to date has been done for
so-called hierarchical scale-free random networks, which are
characterized by a power-law degree distribution $P(k) \sim k^{-\gamma}$,
where P(k) is the probability for a node to have k neighbors
(irrespective of edge direction), and a power-law scaling for
the clustering coefficient $C(k) \sim k^{-\alpha}$, where C(k) is the
average clustering coefficient for a node with k neighbors.
Hierarchical scale-free random networks share with biological
networks the property that they are organized into many
small, highly connected modules that combine hierarchically
into larger, less cohesive units. The hierarchical scale-free
model predicts that highly abundant network motifs always
aggregate into larger motif clusters centred around network
hubs in order to distribute a large number of motifs over
a comparatively small number of nodes.

Although Vazquez et al. showed that several biological
networks could be approximated by the hierarchical scale-free
model, new data on these networks has accumulated since
then. We calculated the degree distribution P(k) and clustering
coefficient distribution C(k) for the transcriptional and
posttranslational regulatory networks as well as for the physical
protein-protein interaction network in yeast (Figure 1).
For all three networks the degree distribution fits well to a
power law (correlation coefficient R = 0.89, 0.87 and 0.95,
respectively), with deviations mainly at the (relatively few)
high-degree nodes or hubs. The clustering coefficient distribution,
however, shows significant deviation from power-law scaling
already at medium-degree nodes (R = 0.61, 0.74 and 0.30,
respectively). In other words, many medium-degree nodes in these
networks have a significantly higher clustering coefficient than
expected from the hierarchical scale-free model, suggesting
the presence of an additional organizational level.
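
The quantities compared above can be illustrated with a short Python sketch. It computes P(k) and C(k) from an undirected edge list and fits a straight line in log-log coordinates. The function names are hypothetical, and measuring the quality of fit by the Pearson correlation of the log-transformed data is an assumption, since the fitting procedure behind the reported R values is not specified here.

```python
import math
from collections import defaultdict


def degree_and_clustering(edges):
    """Degree and local clustering coefficient of every node (undirected view)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            nbrs[u].add(v)
            nbrs[v].add(u)
    degree, clustering = {}, {}
    for u, ns in nbrs.items():
        k = len(ns)
        degree[u] = k
        # each link between two neighbours of u is seen from both ends, hence // 2
        links = sum(1 for a in ns for b in nbrs[a] if b in ns) // 2
        clustering[u] = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
    return degree, clustering


def distributions(degree, clustering):
    """Return P(k) and C(k) as dictionaries keyed by the degree k."""
    by_k = defaultdict(list)
    for u, k in degree.items():
        by_k[k].append(clustering[u])
    n = len(degree)
    p_k = {k: len(cs) / n for k, cs in by_k.items()}
    c_k = {k: sum(cs) / len(cs) for k, cs in by_k.items()}
    return p_k, c_k


def loglog_fit(dist):
    """Least-squares line through (log k, log value); returns (slope, Pearson r)."""
    pts = [(math.log(k), math.log(v)) for k, v in dist.items() if k > 0 and v > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)
```

For a decaying power law the fitted slope and r are both negative, so a positive value such as those quoted above would correspond to the absolute value of r under this assumption.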

Fig. 1 Degree distribution P(k) (left) and clustering coefficient C(k)
(right) as a function of the degree k for three molecular interaction
networks in yeast: A. Transcriptional regulatory network; B.
Posttranslational regulatory network; C. Protein-protein interaction
network. In the upper right corner of each figure, the correlation
coefficient R of the data to the best power-law fit is given.

2.2 A network motif aggregation statistic to quantify local
modularity

We hypothesized that deviations of the clustering coefficient
distributions from power-law behavior are due to a local
aggregation of network motifs around specific nodes which are
not necessarily hubs. To quantify this aggregation of network
motifs, we made the following considerations. If two networks
share the same number of instances of a given motif,
then the motif is more aggregating in the network where fewer
nodes participate in one of the motif instances. Conversely, if
two networks have the same number of nodes participating in
motif instances, the motif is more aggregating in the network
which has the higher number of motif instances among these
nodes. We defined a network motif aggregation statistic S for
any three-node motif, having exactly these properties, as the
ratio

(1)

where N is the total number of motif instances and $n_1$, $n_2$ and
$n_3$ are the numbers of network nodes which participate at least
once in a motif at each of the three possible motif nodes (see
Methods for details).
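
A minimal Python sketch of how such a statistic could be evaluated for the feedforward loop is given below. The normalization used, S = N / (n1 n2 n3)^(1/3), is an assumption: it is one ratio that satisfies both properties stated above, and the definition in the Methods takes precedence. The function names are hypothetical.

```python
def feedforward_loops(edges):
    """Enumerate FFL instances (x, y, z) with directed edges x->y, y->z and x->z."""
    succ = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            succ.setdefault(u, set()).add(v)
    instances = []
    for x, ys in succ.items():
        for y in ys:
            for z in succ.get(y, ()):
                if z != x and z in ys:
                    instances.append((x, y, z))
    return instances


def aggregation_statistic(instances):
    """Assumed form: S = N / (n1 * n2 * n3) ** (1/3)."""
    if not instances:
        return 0.0
    n_instances = len(instances)
    # number of distinct nodes seen in each of the three motif positions
    n1, n2, n3 = (len({inst[r] for inst in instances}) for r in range(3))
    return n_instances / (n1 * n2 * n3) ** (1.0 / 3.0)
```

With this form, four feedforward loops that share the same regulator and intermediate node score higher than four disjoint loops, which is the behaviour the two considerations above require.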

We used this statistic to compare the real networks to ran- 
domized networks which preserve the in- and out-degree dis- 
tributions as well as the total number of instances of the input 
motif of the real network. This random network ensemble is 
different from the usual one which only preserves the degree 
distributions. By adding the constraint to also preserve the total
motif count, we ensure that the aggregation of network motifs is
assessed independently of their abundance or enrichment. We say
a network exhibits local modularity (as opposed to hierarchi- 
cal modularity) with respect to a certain motif if its aggrega- 
tion statistic is significantly higher in the real network than in 
the randomized networks. 
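
The final comparison step can be sketched as follows, assuming the statistic has already been evaluated on each randomized network. Generating the ensemble itself (with preserved degrees and motif count) is not shown. The function name and the use of a one-sided empirical P-value are assumptions made for illustration.

```python
import math
import statistics


def ensemble_significance(observed, random_values):
    """Z-score and one-sided empirical P-value of an observed statistic."""
    mean = statistics.mean(random_values)
    sd = statistics.stdev(random_values)  # sample standard deviation
    diff = observed - mean
    if sd > 0:
        z = diff / sd
    else:  # degenerate ensemble: all randomized values are identical
        z = 0.0 if diff == 0 else math.copysign(math.inf, diff)
    # fraction of randomized networks scoring at least as high as the real one
    p = sum(1 for v in random_values if v >= observed) / len(random_values)
    return z, p
```

With an ensemble of 1000 randomized networks, a returned P-value of 0 would be reported as P < 0.001 and a value of 1 as P > 0.999.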

2.3 Feedforward loop aggregation in yeast regulatory 
networks is independent of enrichment 

We first considered the feedforward loop (FFL) in the
transcriptional and posttranslational regulatory networks. The
transcriptional FFL is undoubtedly the best studied network motif
and its functional role and aggregation have been described in
several studies. It is strongly enriched (P < 0.001)
in our network and also significantly aggregating (P < 0.001).
Interestingly, the FFL is not at all enriched in the
posttranslational network (P > 0.999), where it in fact occurs
significantly less often than expected by chance. However, the
posttranslational FFL is strongly aggregating (P < 0.001). This
result already indicates that network motif enrichment and
aggregation are not in a one-to-one relation as in the hierarchical
scale-free model, and that the deviations of the clustering
coefficient distributions from the hierarchical scale-free model in
both networks (Figure 1) are well represented by the network
motif aggregation statistic.

2.4 Composite network motifs also exhibit local modularity
independent of their enrichment

An additional reason why the clustering coefficient distributions
may deviate from the hierarchical scale-free model is
the fact that the transcriptional, posttranslational and
protein-protein interaction networks do not exist in isolation but are
intertwined with each other. Network motifs composed of multiple
interaction types represent the functional relationships
between different levels of regulation in a cell. Hence
we examined the enrichment and aggregation of all three-node
composite motifs which occur at least 100 times in
the integrated network of transcriptional, posttranslational and
protein-protein interactions (Figure 2). There appears to be
no strong relation between network motif enrichment and
aggregation and in particular, there are several examples of motifs
which are not enriched yet display significant aggregation
(Figure 2A). The Spearman rank correlation between the
enrichment and aggregation Z-scores is 0.55, indicating that both
properties are at best weakly correlated to each other (Figure 2C).

2.5 An algorithm to identify local clusters of overlapping 
network motifs 

To analyze in more detail the functional role of network motif
aggregation, we developed an algorithm to identify network
motif clusters. Kashtan et al. introduced the concept
of topological motif generalizations which consist of perfect
motif replications along one of the motif nodes (Figure 3A).
To allow for the possibility of imperfect networks with missing
interactions, we further generalized this concept and defined
motif clusters as subnetworks which locally maximize
the aggregation statistic S (cf. eq. (1)). In a motif cluster,
each motif node i corresponds to a 'node role' and is replicated
into a set of cluster nodes $X_i$ (Figure 3B). The aggregation
score of a cluster is defined as the aggregation statistic
restricted to the subnetwork formed by $X_1$, $X_2$ and $X_3$. To find
high-scoring clusters, we defined cluster membership weights
for each node role, similar to spectral weights for matrices
such as the PageRank or hub and authority weights: a
node gets a high weight in role 1 if it belongs to many motif
instances together with nodes which have high weights in roles
2 and 3, and similarly for the other roles. This yields a set
of multilinear equations in the membership weights for each
role which are easily solved numerically. After taking a suitable
threshold on the weight vectors a high-scoring cluster is
obtained. The algorithm continues in an iterative fashion by
removing from the network all motif instances assigned to the
previous cluster and repeating the procedure until no more
instances remain. Since motif instances, rather than nodes or
edges, are partitioned, nodes and edges can belong to multiple
clusters. We refer to the Methods section for more details.
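
The iterative scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation from the Methods: the weights are updated by a simple fixed-point iteration, normalized with the Euclidean norm, and thresholded at a fixed fraction of the largest weight per role. All function names and the `rel_threshold` parameter are hypothetical.

```python
import math


def role_weights(instances, n_iter=100):
    """Membership weights per node role, by repeated multilinear updates."""
    roles = [{inst[r] for inst in instances} for r in range(3)]
    w = [dict.fromkeys(nodes, 1.0) for nodes in roles]
    for _ in range(n_iter):
        new = [dict.fromkeys(nodes, 0.0) for nodes in roles]
        for a, b, c in instances:
            # a node scores high if its partners in the other two roles score high
            new[0][a] += w[1][b] * w[2][c]
            new[1][b] += w[0][a] * w[2][c]
            new[2][c] += w[0][a] * w[1][b]
        for r in range(3):
            # Euclidean normalization is an assumption of this sketch
            norm = math.sqrt(sum(x * x for x in new[r].values())) or 1.0
            w[r] = {v: x / norm for v, x in new[r].items()}
    return w


def extract_cluster(instances, rel_threshold=0.5):
    """Threshold the weights and collect the instances inside the cluster."""
    w = role_weights(instances)
    sets = []
    for r in range(3):
        cutoff = rel_threshold * max(w[r].values())  # hypothetical threshold rule
        sets.append({v for v, x in w[r].items() if x >= cutoff})
    members = [i for i in instances if all(i[r] in sets[r] for r in range(3))]
    return sets, members


def motif_clusters(instances, rel_threshold=0.5):
    """Peel off clusters until every motif instance has been assigned."""
    remaining, clusters = list(instances), []
    while remaining:
        sets, members = extract_cluster(remaining, rel_threshold)
        if not members:  # safeguard against a non-terminating loop
            break
        clusters.append((sets, members))
        assigned = set(members)
        remaining = [i for i in remaining if i not in assigned]
    return clusters
```

Motif instances are passed as tuples `(node_role_1, node_role_2, node_role_3)`. Because instances rather than nodes are removed after each round, the same node can reappear in later clusters.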



2.6 Comparison with Zhang et al.

Zhang et al. defined network themes as classes of motif clusters
based on visual inspection of composite network motifs.
Our definition of local modularity on the other hand is based
on the significance of the network motif aggregation statistic
and provides an unbiased and rigorous method for identifying
network themes. Three of the network themes discovered by
Zhang et al. pertain to the networks studied here, namely the
transcriptional feedforward loop (Figure 2A(1)), the
transcriptionally coregulated interacting proteins motif (Figure 2A(2))
and the copointing interacting transcription factors motif
(Figure 2A(3)). In all three cases our method found a highly
significant aggregation Z-score, confirming the validity of our
approach.

Fig. 2 Significant locally modular (A) and non-modular (B) composite
network motifs in the integrated yeast network, organized by common
motif classes and functional themes. Shown is for each motif the
aggregation Z-score ($Z_a$), the enrichment Z-score ($Z_e$) and the
number of instances (N), and for each functional theme an example of
a motif cluster. Significant Z-scores (P < 0.005) are highlighted in
orange. Interaction color legend: TR, PTL and PPI interactions
respectively in blue, green and red. C. Scatterplot of $Z_e$ vs.
$Z_a$ ranks (in ascending order) for all motifs in panels A and B.
Panel A, motif classes and functional themes: (1) TR feedforward
loop: various functions; (2) coregulated interacting proteins:
coregulated protein networks, various functions; (3) copointing
interacting regulators: various functions; (4) protein-interaction
mediated regulatory loop: regulation of/by protein complexes; (5)
posttranslational feedforward loop. Panel B, motif classes: (1)
coregulated interacting proteins (Z-score -1.39, N = 1734); (2)
feedforward loops (Z-scores 1.98 and -0.29, N = 222); (3) feedback
loops (Z-score -1.67, N = 131).

Fig. 3 Network motif clustering. A. Example of a topological motif
generalization where all possible motif instances (in this case FFL)
between the nodes are present. B. Example of a motif cluster (in this
case FFL) with a high aggregation score (high number of motif
instances relative to the number of nodes in $X_1$, $X_2$ and $X_3$).

Having an automated clustering algorithm furthermore allows us
to identify functional themes associated with each locally
modular network motif. For instance, among the functional
categories enriched in transcriptional FFL clusters, we
find mainly the core processes associated with transcription
such as transcriptional control, DNA binding and regulation
of metabolic processes (Supplementary Table S1), supporting
the hypothesis that transcriptional FFLs play a universal
information-processing role. For the transcriptionally
coregulated interacting proteins motif, it is usually assumed
that enrichment and clustering reflect a 'regulonic complex'
theme in which transcriptionally coregulated interacting
proteins are often members of a protein complex. We found
that high-scoring coregulated protein clusters sometimes overlap
with known protein complexes (Supplementary Table S2),
but more often form 'functional protein networks' (Figure
2A(2)): subnetworks of the PPI network enriched for a particular
function and identified by overlaying the protein interaction
network with an additional layer of information, in this
case regulator-target data. Functional coregulated protein
networks can be of practical interest to generate detailed
hypotheses for the different functions in which a particular
regulator is involved. For instance, ABF1 is a multifunctional
global regulator, but its set of targets in the transcriptional
network is only enriched for tRNA synthesis. Network motif
clustering on the other hand identifies protein networks regulated
by ABF1 enriched for several more categories, many of which
are consistent with current knowledge, such as general DNA
binding function, regulation of ribosome biosynthesis, nuclear
transport etc. (Supplementary Table S3). Interestingly,
the posttranslationally coregulated interacting proteins
motif is not significantly aggregating, although it is
significantly enriched (Figure 2B(1)). This means that targets
regulated by the same kinase physically interact more often than
randomly selected proteins, but the resulting coregulated protein
networks are not more dense than expected by chance.
This result is consistent with the fact that a protein complex
can be posttranslationally regulated by regulating just one
instead of all of its components (see also Section 2.8).
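
Functional enrichment of the proteins in a cluster is commonly assessed with a hypergeometric test. The sketch below shows that calculation in Python. Its use here is an assumption, since the test behind the Supplementary Tables is not stated in this section, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of drawing at least k annotated genes.

    N: genes in the background, K: background genes carrying the annotation,
    n: genes in the cluster, k: cluster genes carrying the annotation.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

A small P-value indicates that the functional category is over-represented in the cluster relative to the background. When many categories are tested, a multiple-testing correction would normally be applied on top of this.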

Transcription factors often function as a complex and the
binding sites for these transcription factors occur more
frequently within the same promoter regions, leading to a
'copointing' theme. The copointing interacting regulators motif
is significantly aggregating at the transcriptional and
posttranslational level (Figure 2A(3)). At the transcriptional level,
copointing interacting regulator pairs include well-known
cooperating transcription factors like the cell cycle
transcription factors SWI4-SWI6-MBP1, the ribosomal protein
regulators RAP1-FHL1, the galactose response regulatory
complex GAL3-GAL80, and others. At the posttranslational
level, many clusters come from the multi-functional
cyclin-dependent kinases (CDK) CDC28 or PHO85 complexed with
one of their many cyclin activators.

We conclude that highly abundant network motifs which are 
strongly enriched as well as aggregating likely have a univer- 
sal information-processing role which extends across various 
functional categories. 

2.7 Posttranslational feedforward loop aggregation reflects
a cell-cycle regulation theme

Perhaps the most surprising finding in our analysis is the
existence of network motifs which aggregate significantly but are
not enriched (Figure 2). We hypothesize that local modularity
of a network motif without significant enrichment indicates
that this motif is important for specific biological functions
but does not play a universal role like the strongly enriched
motifs in the previous section. Two examples of such motifs
are the posttranslationally controlled feedforward loops
(Figure 2A(5)).

More than half of the posttranslational FFLs and mixed
posttranslational-transcriptional FFLs belong to clusters
regulated by the multi-functional CDKs CDC28 or PHO85 and
these clusters are indeed often enriched for cell-cycle related
functions (Supplementary Tables S4 and S5). CDC28 is the
central coordinator of the yeast cell cycle and it has been
shown, using dynamical models for individual motif instances,
that the mixed posttranslational-transcriptional FFLs
regulated by it are important transducers between cell cycle
regulatory signals and responses. Our approach on the other
hand reveals the overlapping nature of these motifs. For
instance, cluster 1 (depicted in Figure 2A(5)) contains three
transcription factors functioning in G1/S transition (SWI4,
SWI6, STB1), one in G2 phase (FKH2) and one in G2/M
transition (NDD1). The complexity of the overlapping motif
structure is further emphasized by the fact that NDD1 not only
functions as a transcriptional transducer of the cell cycle
signal, but also as a response target of the four other transcription
factors. The target proteins in this cluster are enriched for
several cell-cycle related functions (Supplementary Table S5) and
eleven of the sixteen targets are periodically expressed.
PHO85 is another CDK with a multifunctional role in cell
cycle control and other processes. Like CDC28 it is
activated by a large family of cyclins. The transcription factors
associated with clusters regulated by PHO85 contain the PHO85
substrates PHO4, GCN4 and SWI5 whose phosphorylation is
important for the role of PHO85 in regulating environmental
signalling response and the cell cycle. This suggests that
the posttranslational-transcriptional FFL plays a similar
dynamical role in transducing PHO85 regulatory signals as for
CDC28.



The cell cycle is a complex process and cell-cycle kinases
often also interact physically with their target substrates. As
a result there are two motifs involving all three interaction
types (Figure 2A(5)) whose instances almost all overlap with mixed
posttranslational-transcriptional FFLs. The aggregation
significance is consistent across all four posttranslational FFLs
but the enrichment is not (Figure 2A(5)). This does not
contradict the previous result that enriched locally modular
motifs play a universal role, since the vast majority of
triple-interaction motifs involve cell-cycle regulators, i.e. 'universal'
always refers to the network at hand.



In summary, the posttranslational regulatory network exhibits
an overall lack of feedforward loops,
presumably because it operates on a much shorter timescale
than the transcriptional regulatory network to elicit fast
information-processing responses, typically in the form of
signaling cascades. Posttranslational feedforward loops (pure
as well as composite) do seem to play an important role,
however, in regulation of the cell cycle, and this 'local' role is
reflected in a significant aggregation of these motifs around the
core CDKs CDC28 and PHO85.



2.8 Transcriptional and posttranslational protein 
interaction-mediated regulatory loop aggregation 
reflects a regulatory protein complex theme 

Another motif which is not enriched yet displays significant
aggregation is the protein-interaction mediated transcriptional
regulatory loop (Figure 2A(4)), a circuit that is thought to
serve for feedback mechanisms between a regulator-target pair
via a common partner in the protein interaction network. In
the equivalent posttranslational motif all interactions can
occur simultaneously and its proposed function is that of a
'scaffold motif' where the biochemical interaction between the
regulator and its target substrate is enabled by the common
interactor. A natural cluster generalization of such feedback or
scaffold circuits is a 'regulonic star', where multiple targets of
a regulator ('spokes') interact with the same feedback or
scaffold mediator ('hub'). Our algorithm identified several such
modules as high-scoring motif clusters (Supplementary Tables
S6 and S7). For instance, transcriptional cluster 11 consists of
ABF1, a DNA binding protein that regulates multiple nuclear
events, regulating a set of ten nuclear transport genes which
all interact with PSE1, a nuclear transport receptor which also
interacts with ABF1. A link between ABF1 and the nuclear
transport machinery via PSE1 is known. Transcriptional
cluster 14 has HSP82 as the hub protein. HSP82 is one of two
yeast genes encoding HSP90, a protein folding chaperone
which plays a central role in various aspects of cellular
signaling. Binding of HSP90 to HAP1, the regulator of cluster
14, is necessary for heme activation of HAP1. This cluster
may represent a feedback mechanism since TAH1, a cofactor
of HSP90, is one of its spoke proteins. Posttranslational
cluster 43 is an example of a scaffolding regulonic star. It
consists of the mitotic B-type cyclin CLB2 which phosphorylates
nine proteins involved in budding, cell polarity and filament
formation, which all interact with NAP1, a protein which is
known to interact with and facilitate the function of CLB2.
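
The regulonic star pattern itself is simple to state in code. The following Python sketch lists, for every regulator and every protein-interaction partner of that regulator, the targets that also interact with the partner. It is a direct pattern search given for illustration, not the clustering algorithm of Section 2.5, and the function name and the `min_spokes` parameter are hypothetical.

```python
from collections import defaultdict


def regulonic_stars(regulatory_edges, ppi_edges, min_spokes=2):
    """Map (regulator, hub) to the set of spokes.

    A spoke is a target of the regulator that interacts with the hub,
    and the hub itself interacts with the regulator.
    """
    targets = defaultdict(set)
    for regulator, target in regulatory_edges:
        targets[regulator].add(target)
    partners = defaultdict(set)
    for u, v in ppi_edges:  # protein-protein interactions are undirected
        partners[u].add(v)
        partners[v].add(u)
    stars = {}
    for regulator, tgts in targets.items():
        for hub in partners.get(regulator, ()):
            spokes = {t for t in tgts if t != hub and hub in partners.get(t, ())}
            if len(spokes) >= min_spokes:
                stars[(regulator, hub)] = spokes
    return stars
```

In the ABF1 example above, ABF1 would be the regulator, PSE1 the hub and the nuclear transport genes the spokes.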

We also found a relation between the protein-interaction
mediated regulatory loops and protein complexes in the form
of a 'regulatory interacting (RI) double-star' cluster type,
consisting of one or a few regulator-target pairs which share a
common set of partners in the protein interaction network.
Usually the spoke proteins in such an RI double-star mutually
interact and form the components of a protein complex,
often together with the hub protein (Supplementary Tables S8
and S9). For instance, in posttranslational cluster 32, SSN3
(also called SRB10) phosphorylates MED2, a component of
the RNA polymerase II Mediator complex, and both interact
with eight other Mediator components and the transcription
factor YAP1 (depicted in Figure 2A(4)). It is known that
posttranslational modification of Mediator components affects its
function, and that SRB10 is part of a module whose binding
to the Mediator complex determines whether Mediator can
associate with pol II or not. YAP1, a bZIP transcription factor
required for oxidative stress tolerance, is related to the
Mediator complex via a transcriptional RI double-star in which
25 components of the Mediator complex interact with YAP1
and with SRB6 and SRB7, two other components of the Mediator
complex that are transcriptionally regulated by YAP1. The
Mediator complex acts as a bridge between gene-specific
transcription factors and the basal pol II transcription machinery
and mutants of the general transcription factor TFIIA are
unable to grow in conditions that require the oxidative stress
response. Another example of a transcriptional RI double-star
is transcriptional cluster 20 where HAP4 regulates GCN4
and both interact with five components of the SWI/SNF
complex, which regulates transcription by nucleosome remodeling.
HAP4 and GCN4 are two transcriptional activators which
target SWI/SNF to appropriate promoters.

The protein complexes which appear in transcriptional regulatory
interacting double-stars are all regulatory complexes
involved in the different steps from chromatin remodelling to 
transcription, translation and posttranslational control (Sup- 
plementary Table S8). In cases where the hub protein also 
belongs to the complex, this suggests that the RI double-star 
acts like a two-node feedback loop in which the transcription 
factor regulates the complex by regulating one or a few of 
its components, and in turn the complex regulates the tran- 
scription factor by interacting with it at the protein level. A 
composite feedback mechanism using a slow (transcriptional) 
and fast (protein-protein interaction) timescale is known to enhance
stability around a steady state. For the posttranslational motif on
the other hand, RI double-star clusters reflect
regulation of the protein complex, and hence we find a much 
broader range of protein complexes besides regulatory com- 
plexes (Supplementary Table S9). This more universal role is 
again consistent with the fact that the posttranslational motif 
is strongly enriched but the transcriptional is not. The inter- 
pretation of a posttranslational RI double-star cluster is that 
the kinase and protein complex form a two-component loop 
where the complex acts as a scaffold protein for its own regu- 
lation. 

Regulatory interacting double-star clustering of
protein-interaction mediated regulatory loops induces a higher-level,
global map of protein complex regulation. In Figure 4 we
considered all protein complexes which overlap significantly with
RI double-star clusters with at least three (transcriptional) or
four (posttranslational) spoke proteins. This map shows a large
number of two-component regulator-complex feedback loops,
with a central role for the Mediator complex and the
nucleosomal proteins. Interestingly, transcriptional and
posttranslational regulation are heavily intertwined in this map. Some
complexes (Mediator, small ribosomal subunit, nucleosomal
proteins) are regulated by transcriptional as well as
posttranslational RI double-stars, while others (Srb10p, SLIK) play
a feedback or scaffolding role for transcriptional as well as
posttranslational regulatory interactions. Figure 4 provides a
novel kind of coarse-grained integrated network representation
which complements previous thematic maps of compensatory
and regulonic complexes.

The protein-interaction mediated transcriptional regulatory
loop has previously been found enriched or not enriched
in different datasets. We calculated the aggregation statistic
in these two datasets as well as for a network of
literature-curated protein-protein interactions, in which the motif is
also enriched, and found in all cases a consistent statistically
significant aggregation (data not shown). The protein
interaction networks where the motif is enriched are targeted
towards co-complex interactions, while the networks where it is
not enriched also contain interactions derived from yeast
two-hybrid studies. Thus we find that network motif aggregation
is a more robust property than network motif enrichment and
that the enrichment in co-complex based networks can be
explained by the relation between the protein-interaction
mediated transcriptional regulatory loop and protein complexes
as detailed in the previous paragraphs. Our results also
highlight the advantage of using an unbiased clustering approach
to identify network themes, since Zhang et al. did consider
the transcriptional protein interaction-mediated loop but found
no theme or cluster generalization for it.

3 Conclusions 

Network motifs form the basic building blocks of complex 
networks and previous studies have suggested an intimate re- 
lation between the enrichment of a motif and its tendency to 
aggregate into functional modules. Evolutionary studies hy- 
pothesized that network motif clusters have evolved by sim- 
ple duplication-divergence mechanisms, concluding that the 
abundance of a motif is merely a by-product of the emergence 
of these aggregated motif structures. On the other hand, a 
purely topological analysis in hierarchical scale-free random 
networks has shown that the only way to distribute a large 
number of motifs over a comparatively small number of nodes 
is to aggregate the motifs around the network hubs, indicating 
that motif aggregation follows from motif enrichment. 

Here we introduced a novel method for assessing in a quan- 
titave way the statistical significance of network motif ag- 
gregation. Using an integrated network of transcriptional, 
posttranslational and protein-protein interactions in yeast, we 
showed that our method produces results which are consistent 
with deviations of the clustering coefficient distribution from 
the hierarchical scale-free model and correctly recapitulate 
previous findings. We showed furthermore that significant ag- 
gregation reflects local modularity of network motifs around 
nodes which are not necessarily hubs in the total network but 
play a core role in specific biological processes. Using a novel
network motif clustering algorithm we found that if, in addition to
aggregating significantly, a motif is also enriched, it likely plays
a universal functional role across many biological processes. If the
motif aggregates significantly without enrichment, it is likely
specific for one or a few processes. We identified novel functional
network themes for such motifs, like a cell-cycle regulatory theme
for posttranslationally controlled feedforward loops and a theme of
regulation of or by protein complexes for protein-interaction mediated
regulatory loops.

Fig. 4 Global map of protein complex regulation through transcriptional and posttranslational regulatory interacting double-star clusters. Oval
nodes are proteins, rectangular nodes are protein complexes which overlap significantly (P < 10~^) with a RI double-star cluster. Two-node
mixed feedback loops indicate that the target hub protein belongs to the same complex as the spoke nodes. This figure is based on the data in
Supplementary Tables S8 and S9. The interaction colors are the same as in Figure 2.

Our results show that network motif aggregation is an important
organizational principle of molecular interaction networks which is
independent of network motif enrichment and, in particular, is more
robust to the use of heterogeneous experimental sources of interaction
data and less sensitive to the incompleteness of all current
interaction networks. We
hypothesize that network motif aggregation is the more fun- 
damental organizational principle and that network motif en- 
richment may follow from aggregation if the local modular- 
ity property extends across a sufficiently large fraction of bi- 
ological processes represented in the network at hand. Future 
work to support this hypothesis should be directed towards 
comparative analyses to confirm these results across multi- 
ple organisms and towards improving the current duplication- 
divergence models for network growth and hierarchical scale- 
free models for network topology in order to reproduce and 
understand the evolutionary origin of the complex modular 
organization observed in the integrated yeast interaction net- 
work. 



4 Methods

4.1 Network data

The protein-protein interaction network (36391 interactions
between 4847 proteins) was extracted from the BioGRID database
(http://www.thebiogrid.org). The transcriptional regulatory network
(11373 interactions between 198 transcription factors and 3535 target
genes) was obtained from Harbison et al.
(http://fraenkel.mit.edu/Harbison/), using a P-value cutoff of 0.005.
The posttranslational regulatory network (5630 interactions between
264 regulators and 1653 target proteins) was extracted from the
BioGRID database (interactions annotated as 'biochemical activity').
The bulk of this network (4621 interactions) comes from the
phosphorylation network of Ptacek et al.
(http://networks.gersteinlab.org/phosphorylome/). Self-interactions
were not kept in any of the networks.

4.2 Network motif aggregation statistic

For a given 3-node motif we choose a particular labeling of its
nodes and define its aggregation statistic as

$$ S = \frac{N}{\sqrt{n_1 n_2 n_3}} \qquad (2) $$

where $N$ is the total number of motif instances and $n_1$, $n_2$ and
$n_3$ are the numbers of nodes which participate at least once in
a motif in node role 1, 2 and 3, respectively. $S$ has the intuitive
properties that it is higher (more aggregation) in a network with a
fixed number of motif instances distributed over a smaller number of
nodes or in a network with more motif instances distributed over a
fixed number of nodes. We have $N \le n_1 n_2 n_3$, and the maximum is
attained for perfect topological motif generalizations as in Figure 3,
where all possible motif instances are indeed present. The square root
in eq. (2) ensures that $S$ will be higher for bigger topological
motif generalizations as well as for larger sets of nodes with
a significant number of motif instances between them (as in
Figure 3), and is thus suitable to measure motif aggregation
in noisy interaction data with potentially a large number of
missing interactions.

4.3 Local network motif aggregation score

We define network motif clusters as subnetworks which locally
maximize the network motif aggregation statistic. To make this
precise, for a given 3-node input motif, we define a 3-dimensional
motif array $T$ by

$$ t_{ijk} = \begin{cases} 1 & \text{if there exists a motif instance between } i, j, k \\ 0 & \text{otherwise} \end{cases} $$

where $(i,j,k)$ is any triple of nodes and the order of the indices
corresponds to a particular labeling of the nodes in the motif. $T$
can be constructed from the adjacency matrices of the networks
defining the motif. For instance, the motif array for the feedforward
loop in a directed network with adjacency matrix $A$ (with edges
$i \to j$, $j \to k$ and $i \to k$) is given by

$$ t_{ijk} = a_{ij}\, a_{jk}\, a_{ik} $$

A motif cluster is now defined by three sets of nodes
$(X_1, X_2, X_3)$ (cf. Figure 3) and its aggregation score can be
written as

$$ S(X_1, X_2, X_3) = \frac{\sum_{i \in X_1,\, j \in X_2,\, k \in X_3} t_{ijk}}{\sqrt{|X_1|\,|X_2|\,|X_3|}} \qquad (3) $$

where $|X|$ denotes the number of nodes in $X$.

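
As an illustration of eqs. (2) and (3), the following minimal Python sketch enumerates feedforward loop instances in a toy directed network and evaluates both the global aggregation statistic and the local aggregation score. The function names and the toy edge list are hypothetical and chosen for illustration only; the brute-force enumeration over node triples is an assumption suitable for small examples, and is not the implementation used in this work.

```python
from itertools import permutations
from math import sqrt


def ffl_instances(edges):
    """Return all triples (i, j, k) with t_ijk = 1 for the feedforward loop,
    i.e. with edges i->j, j->k and i->k all present (brute force)."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    return {
        (i, j, k)
        for i, j, k in permutations(nodes, 3)
        if (i, j) in edge_set and (j, k) in edge_set and (i, k) in edge_set
    }


def aggregation_statistic(instances):
    """Eq. (2): N / sqrt(n1 * n2 * n3); returns 0.0 if there are no instances
    (a convention of this sketch)."""
    if not instances:
        return 0.0
    n1, n2, n3 = (len({t[role] for t in instances}) for role in range(3))
    return len(instances) / sqrt(n1 * n2 * n3)


def local_score(instances, X1, X2, X3):
    """Eq. (3): number of instances inside (X1, X2, X3) over sqrt(|X1||X2||X3|)."""
    count = sum(1 for i, j, k in instances if i in X1 and j in X2 and k in X3)
    return count / sqrt(len(X1) * len(X2) * len(X3))


# Hypothetical toy network: regulator "a" and intermediate "b" share targets "c" and "d".
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]
instances = ffl_instances(edges)
S_global = aggregation_statistic(instances)
S_local = local_score(instances, {"a"}, {"b"}, {"c", "d"})
```

In this toy network the two feedforward loops form a perfect topological generalization with N = n1 n2 n3 = 2, so both the global statistic and the local score equal the square root of 2.
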
4.4 Network motif clustering algorithm

We want to find $(X_1, X_2, X_3)$ which maximize the local
aggregation score. To this end, we first find the best rank-1
approximation to $T$, i.e. find real-valued vectors $(u, v, w)$
maximizing

$$ S(u, v, w) = \frac{\sum_{ijk} t_{ijk}\, u_i v_j w_k}{\|u\|\,\|v\|\,\|w\|} \qquad (4) $$

where $\|u\| = \sqrt{\sum_i u_i^2}$ is the length of $u$. These
maximizing vectors can be found efficiently by a multilinear power
method. For a set of nodes $X$ we define an index vector $u_X$ by

$$ u_{X,i} = \begin{cases} 1 & \text{if } i \in X \\ 0 & \text{otherwise} \end{cases} $$

such that $S(X_1, X_2, X_3) = S(u_{X_1}, u_{X_2}, u_{X_3})$. This
property is used to prove that for any $X_1, X_2, X_3$

$$ |S_{\max} - S(X_1, X_2, X_3)| \le \sqrt{2}\, \hat{S}_{\max} \left( \|u - u_{X_1}\| + \|v - u_{X_2}\| + \|w - u_{X_3}\| \right) \qquad (5) $$

where $S_{\max}$ is the (unknown) maximal value of $S$ over all
possible node sets, $\hat{S}_{\max}$ is the (known) maximal value of
$S$ over all real-valued vectors, $(u, v, w)$ is the (known) best
rank-1 approximation to $T$, and all vectors on the r.h.s. are
normalized to length 1. Using the fact that $(u, v, w)$ have
nonnegative entries, it is trivial to find $(X_1, X_2, X_3)$ which
minimize respectively $\|u - u_{X_1}\|$, $\|v - u_{X_2}\|$ and
$\|w - u_{X_3}\|$, or equivalently, maximize
$\langle u, u_{X_1} \rangle$, $\langle v, u_{X_2} \rangle$ and
$\langle w, u_{X_3} \rangle$, where

$$ \langle u, u_X \rangle = \frac{\sum_{i \in X} u_i}{\sqrt{|X|}} \qquad (6) $$

is the overlap between $u$ and $u_X$. By eq. (5), these
$(X_1, X_2, X_3)$ are the best possible approximation to the highest
scoring motif cluster, given our knowledge of the best rank-1
approximation to $T$. The r.h.s. of eq. (5) gives a precise estimate
on the quality of this approximation. Next we remove from the motif
array $T$ all entries corresponding to the motif instances in the
highest scoring motif cluster. The procedure is repeated for
this truncated motif array and iterated until no more non-zero
entries remain, thus obtaining a partition of all motif instances
into high-scoring motif clusters.

For symmetric motifs (such as the coregulated interacting proteins
motif, Figure 2(2)), the motif array is symmetric for interchanging
two indices, $t_{ijk} = t_{ikj}$ for all triples $(i,j,k)$.
The algorithm proceeds in the same way as before but ensures
that a symmetric maximizer of eq. (4) is found with $v = w$,
resulting in symmetric clusters with $X_2 = X_3$.

4.5 Network randomization algorithms

To assess network motif enrichment we generated random networks
with the same incoming and outgoing degree distributions as the real
networks as follows. A directed network with $N_e$ edges can be
represented by two index vectors $\{I, J\}$ of length $N_e$ such that
the $k$th edge points from node $I_k$ to node $J_k$. We first
generated a random permutation $\pi(J)$ of the indices in $J$. The
network represented by $\{I, \pi(J)\}$ automatically has the same in-
and out-degrees as the original network, except there may be some
unwanted self-interactions, i.e. indices $k$ where $I_k = \pi(J)_k$.
We swap these $\pi(J)_k$ with a randomly chosen entry $\pi(J)_l$ until
we obtain a corrected permutation $\pi_c(J)$ without any
self-interactions and a corresponding randomized network
$\{I, \pi_c(J)\}$. Undirected networks are treated in the same way
except we choose $I_k < J_k$ for every $k$. After randomly permuting
$J$ and correcting for self-interactions as before, we swap columns
for every $I_k > \pi(J)_k$ and correct for any duplicate edges by
again randomly swapping entries in $\pi(J)$. This randomization
strategy is easier to implement and runs faster than the conventional
single edge swapping strategies.
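
The permutation strategy for directed networks can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the toolbox code: the function name and toy edge list are hypothetical, the input is assumed to contain no self-interactions, and the sketch does not check for duplicate edges, which the text above mentions only for the undirected case.

```python
import random
from collections import Counter


def randomize_directed(I, J, rng=None):
    """Return {I, pi_c(J)}: a random permutation of the targets J,
    corrected by random swaps until no self-interactions remain.
    Assumes the input edge list itself has no self-interactions."""
    rng = rng or random.Random()
    pJ = list(J)
    rng.shuffle(pJ)                       # random permutation pi(J)
    n = len(pJ)
    while True:
        loops = [k for k in range(n) if I[k] == pJ[k]]
        if not loops:
            return list(I), pJ
        for k in loops:
            if I[k] == pJ[k]:             # may already be fixed by an earlier swap
                l = rng.randrange(n)      # randomly chosen entry pi(J)_l
                pJ[k], pJ[l] = pJ[l], pJ[k]


# Hypothetical toy network with five directed edges I[k] -> J[k].
I = ["a", "a", "b", "c", "d"]
J = ["b", "c", "c", "d", "a"]
I_rand, J_rand = randomize_directed(I, J, random.Random(1))
```

Because only the order of the entries of J changes, every node keeps its out-degree (its count in I) and its in-degree (its count in J).
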

To assess network motif aggregation we generated random
networks with the same incoming and outgoing degree distributions
as well as the same total motif count as the real networks as
follows. First we generate random networks with the same in- and
out-degrees as described in the previous paragraph. If the total
number of motif instances is smaller in the random network than in
the real one, we generate the list of all 'incomplete' motif
instances, i.e. triplets of nodes with two motif interactions present
and one absent (Figure 5). We randomly select one of these incomplete
motifs. For the two nodes with the missing edge between them, we
randomly select an incoming, resp. outgoing edge. We then swap the
endpoints of these edges to 'close' the incomplete motif while
preserving the degree distributions (Figure 5, A→B). If the total
number of motif instances is larger in the random network than in the
real one, we randomly select a motif instance and an edge in that
motif (Figure 5). We randomly select another edge not belonging
to any motif instance and swap the endpoints of the motif edge
with this edge to 'open' a motif instance (Figure 5, B→A). We
iterate between closing and opening motif instances until the total
number of motif instances is equal between the random and
real network.
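
A single 'closing' swap can be illustrated with the short Python sketch below. It is a hedged example rather than the procedure's actual implementation: the function names and toy edges are hypothetical, and it assumes that the missing edge points from x to y, so that an outgoing edge of x and an incoming edge of y are rewired. Since a swap can also create or destroy other motif instances, the total motif count would have to be recomputed after each swap.

```python
import random
from collections import Counter


def close_missing_edge(edges, x, y, rng):
    """Try to create the missing edge x->y by a degree-preserving swap:
    edges x->p and q->y are rewired to x->y and q->p.
    Returns the new edge set, or None if the sampled swap is not allowed."""
    edges = set(edges)
    if x == y or (x, y) in edges:
        return None
    out_x = sorted(e for e in edges if e[0] == x)   # outgoing edges of x
    in_y = sorted(e for e in edges if e[1] == y)    # incoming edges of y
    if not out_x or not in_y:
        return None
    _, p = rng.choice(out_x)
    q, _ = rng.choice(in_y)
    if q == p or (q, p) in edges:                   # avoid self-loops and duplicates
        return None
    return (edges - {(x, p), (q, y)}) | {(x, y), (q, p)}


def degrees(edges):
    """Out- and in-degree of every node as a pair of Counters."""
    return Counter(a for a, _ in edges), Counter(b for _, b in edges)


# Hypothetical incomplete FFL: a->b and b->c are present, a->c is missing.
edges = {("a", "b"), ("b", "c"), ("a", "d"), ("e", "c")}
rng = random.Random(0)
closed = None
while closed is None:                               # resample until a swap is allowed
    closed = close_missing_edge(edges, "a", "c", rng)
```
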

Fig. 5 Example of edge swapping operations to increase (A→B) or
decrease (B→A) the number of FFLs in a random network while
keeping the in- and out-degree distributions constant.

4.6 Network motif enrichment and aggregation significance

To compute network motif enrichment significance, we generated
1000 random networks with the same in- and out-degree
distributions as the real transcriptional, posttranslational and
protein-protein interaction networks. The enrichment P-value
is defined as the fraction of random networks having at least
the same number of motif instances as the real networks, and
the Z-score is defined as

$$ Z = \frac{N - \mu}{\sigma} $$

where $N$ is the number of motif instances in the real network
and $\mu$, resp. $\sigma$, is the mean, resp. standard deviation, of
the number of motif instances in the random network ensemble.

To compute network motif aggregation significance, we generated
for each input motif 1000 random networks with the same
in- and out-degree distributions as well as the same total motif
count as the transcriptional, posttranslational and protein-protein
interaction networks. The aggregation P-value is defined as the
fraction of random networks having at least the same aggregation
statistic as the real network, and the Z-score is defined as in the
previous paragraph, with the aggregation statistic in place of the
number of motif instances. Notice that enrichment can be computed for
all composite motifs using a single ensemble of integrated random
networks. On the other hand, to compute aggregation, a separate
random network ensemble has to be generated for each input motif.
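
The P-value and Z-score definitions above amount to the following small Python sketch. The function name and the example numbers are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def significance(real_value, random_values):
    """Empirical P-value and Z-score of a real-network value (motif count or
    aggregation statistic) against values from a random network ensemble."""
    p_value = sum(1 for r in random_values if r >= real_value) / len(random_values)
    mu = mean(random_values)
    sigma = stdev(random_values)      # sample standard deviation (assumption)
    z_score = (real_value - mu) / sigma
    return p_value, z_score


# Hypothetical motif counts: five random networks and a real count of 4.
p_value, z_score = significance(4, [1, 2, 3, 4, 5])
```

With an ensemble of 1000 random networks, the smallest nonzero P-value that this empirical definition can return is 0.001.
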

4.7 Software 

A Network Motif Clustering Toolbox containing an implementation
of the network motif clustering algorithm as well as functions to
generate random networks and compute network motif enrichment and
aggregation significance is freely available for academic purposes,
including source code, from
http://omics.frias.uni-freiburg.de/software/. The toolbox operates
under both Matlab (http://www.mathworks.com) and Octave
(http://www.gnu.org/software/octave).
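
Independently of the toolbox, the core step of the clustering algorithm of Section 4.4 can be illustrated in a few lines of Python. The sketch below is not taken from the toolbox: it assumes that the motif array is stored as a set of triples (i, j, k) with t_ijk = 1, uses a plain alternating power iteration with a fixed number of sweeps as the multilinear power method, and all function names and the toy data are hypothetical.

```python
from math import sqrt


def normalize(x):
    """Scale a sparse vector (dict node -> value) to Euclidean length 1."""
    norm = sqrt(sum(val * val for val in x.values()))
    return {node: val / norm for node, val in x.items()}


def rank1_power_method(instances, n_sweeps=50):
    """Alternating power iteration for nonnegative (u, v, w) maximizing eq. (4)."""
    u = normalize({i: 1.0 for i, _, _ in instances})
    v = normalize({j: 1.0 for _, j, _ in instances})
    w = normalize({k: 1.0 for _, _, k in instances})
    for _ in range(n_sweeps):
        new_u = dict.fromkeys(u, 0.0)
        for i, j, k in instances:
            new_u[i] += v[j] * w[k]
        u = normalize(new_u)
        new_v = dict.fromkeys(v, 0.0)
        for i, j, k in instances:
            new_v[j] += u[i] * w[k]
        v = normalize(new_v)
        new_w = dict.fromkeys(w, 0.0)
        for i, j, k in instances:
            new_w[k] += u[i] * v[j]
        w = normalize(new_w)
    value = sum(u[i] * v[j] * w[k] for i, j, k in instances)
    return u, v, w, value


def best_index_set(u):
    """Node set X maximizing the overlap of eq. (6), sum_{i in X} u_i / sqrt(|X|).
    For nonnegative u it suffices to scan prefixes of the entries sorted
    in decreasing order."""
    ranked = sorted(u, key=u.get, reverse=True)
    best, best_overlap, total = set(), -1.0, 0.0
    for m, node in enumerate(ranked, start=1):
        total += u[node]
        overlap = total / sqrt(m)
        if overlap > best_overlap:
            best_overlap, best = overlap, set(ranked[:m])
    return best


# Hypothetical motif array: a two-instance cluster plus one isolated instance.
instances = {("a", "b", "c"), ("a", "b", "d"), ("e", "f", "g")}
u, v, w, value = rank1_power_method(instances)
X1, X2, X3 = (best_index_set(x) for x in (u, v, w))
```

In the full procedure the instances inside (X1, X2, X3) would then be removed from the motif array and the same step repeated on the remaining instances.
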